Abstract

Starting from the generalized exponential function
$\exp_\kappa(x) = \left(\sqrt{1+\kappa^2 x^2} + \kappa x\right)^{1/\kappa}$, with
$\exp_0(x) = \exp(x)$, proposed in Ref. [G. Kaniadakis, Physica A 296, 405 (2001)], the survival
function $P_>(x) = \exp_\kappa(-\beta x^\alpha)$, where $x \in \mathbb{R}^+$, $\alpha, \beta > 0$, and
$\kappa \in [0,1)$, is considered in order to analyze the data on personal income distribution for
Germany, Italy, and the United Kingdom. The distribution defined above is a continuous
one-parameter deformation of the stretched exponential function
$P_>^0(x) = \exp(-\beta x^\alpha)$, to which it reduces as $\kappa$ approaches zero, and it behaves in
very different ways in the $x \to 0$ and $x \to \infty$ regions. Its bulk is very close to the
stretched exponential one, whereas its tail decays following the power law
$P_>(x) \sim (2\beta\kappa)^{-1/\kappa} x^{-\alpha/\kappa}$. This makes the $\kappa$-generalized function
particularly suitable to describe simultaneously the income distribution among both the richest
part and the vast majority of the population, which are generally fitted by different curves. An
excellent agreement is found between our theoretical model and the observational data on personal
income over their entire range.
I. INTRODUCTION



A renewed interest in studying the distribution of income has emerged over the last
years in both the physics and economics communities. The focus has been mostly
on the empirical analysis of extensive datasets to infer the exact shape of personal income
distributions, and on the design of theoretical models that can reproduce them [2].

A natural starting point in this area of enquiry was the observation that the number of
persons in a population whose incomes exceed $x$ is often well approximated by $Cx^{-\alpha}$, for
some real $C$ and some positive $\alpha$, as Pareto argued over 100 years ago. However,
theoretical and empirical work rapidly pointed out that a Pareto-like behavior can be expected
only in the upper tail of the income distribution, while the bulk of the income, held by about
95% of the population, is governed by a completely different law. Therefore, many recent papers
within this literature have sought to characterize the distribution of income by a mixture of
known statistical distributions, even if there is a dispute about what these distributions are:
indeed, while it seems to be generally acknowledged that the top 1-5% of incomes follow Pareto's
law, an exact and unequivocal characterization of the low to medium income region of the
distribution is still elusive. For example, some authors claim that this region is lognormal,
while according to other works, including Ref. [20], the distribution of personal income for the
majority of the population should follow the exponential law.

In the present work we address the issue of data analysis related to the size distribution
of income by adopting a statistical mechanics approach introduced by one of us in earlier work
(see Ref. [25]), based on the one-parameter generalization of the exponential function
defined through

$$\exp_\kappa(x) = \left(\sqrt{1+\kappa^2 x^2} + \kappa x\right)^{1/\kappa}, \qquad x \in \mathbb{R}. \tag{I.1}$$

The properties of the function $\exp_\kappa(x) \in C^\infty(\mathbb{R})$ have been considered
extensively in the literature. We recall briefly that in the $\kappa \to 0$ limit the function
$\exp_\kappa(x)$ reduces to the ordinary exponential, i.e. $\exp_0(x) = \exp(x)$, and that for
$x \to 0$, independently of the value of $\kappa$, it behaves very similarly to the ordinary
exponential, since for $\kappa^2 x^2 < 1$ the following Taylor expansion holds:

$$\exp_\kappa(x) = 1 + x + \frac{x^2}{2} + (1-\kappa^2)\,\frac{x^3}{3!} + \ldots \tag{I.2}$$
It is remarkable that the first three terms of the Taylor expansion are the same as those of the
ordinary exponential. On the other hand, the most interesting property of $\exp_\kappa(x)$ for
applications in statistics is the power-law asymptotic behavior

$$\exp_\kappa(x) \underset{x \to \pm\infty}{\sim} |2\kappa x|^{\pm 1/|\kappa|}. \tag{I.3}$$

The generalized logarithmic function $\ln_\kappa(x) \in C^\infty(\mathbb{R}^+)$ is defined as the
inverse function of $\exp_\kappa(x)$, namely
$\ln_\kappa(\exp_\kappa x) = \exp_\kappa(\ln_\kappa x) = x$, and is given by

$$\ln_\kappa(x) = \frac{x^{\kappa} - x^{-\kappa}}{2\kappa}. \tag{I.4}$$
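The definitions in Eqs. (I.1) and (I.4) can be checked numerically. The following is a minimal Python sketch, not part of the original analysis; the function names `exp_kappa` and `ln_kappa` are chosen here for illustration, and the `asinh` form is used only because it is algebraically identical to Eq. (I.1) and numerically stable for large negative arguments.

```python
import math


def exp_kappa(x: float, kappa: float) -> float:
    """kappa-exponential of Eq. (I.1); reduces to exp(x) for kappa == 0."""
    if kappa == 0.0:
        return math.exp(x)
    # (sqrt(1 + k^2 x^2) + k x)^(1/k) == exp(asinh(k x) / k)
    return math.exp(math.asinh(kappa * x) / kappa)


def ln_kappa(x: float, kappa: float) -> float:
    """kappa-logarithm of Eq. (I.4), defined for x > 0."""
    if kappa == 0.0:
        return math.log(x)
    return (x**kappa - x ** (-kappa)) / (2.0 * kappa)


if __name__ == "__main__":
    k = 0.7
    # The two functions are inverses of each other.
    print(ln_kappa(exp_kappa(1.3, k), k))
    # For small |x| the deformed exponential stays close to exp(x), cf. Eq. (I.2).
    print(exp_kappa(0.1, k), math.exp(0.1))
```

For large negative arguments the same function can be compared with the power law of Eq. (I.3), which is the property exploited later for the income tail.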

Starting from the generalized logarithm, the new entropy

$$S_\kappa(f) = -\langle \ln_\kappa(f) \rangle = -\int dx\, f(x) \ln_\kappa f(x) \tag{I.5}$$

has been introduced, which can be written explicitly as

$$S_\kappa(f) = -\int dx\, \frac{f(x)^{1+\kappa} - f(x)^{1-\kappa}}{2\kappa}, \tag{I.6}$$

where $f(x)$ is the probability distribution function. This entropy has the standard
properties of the ordinary Boltzmann-Shannon entropy (which is recovered in the $\kappa \to 0$
limit): it is thermodynamically stable, is Lesche-stable, and obeys the Khinchin axioms of
continuity, maximality, expandability and generalized additivity.

After maximizing the entropy (I.6) under the proper constraints according to the Jaynes
Maximum Entropy Principle of statistical mechanics, one obtains the probability distribution
function

$$p(x) = \alpha \exp_\kappa\!\left(-\frac{E(x) - \mu}{\lambda k_B T}\right), \tag{I.7}$$

where

$$\lambda = \sqrt{1-\kappa^2} \quad \text{and} \quad \alpha = \left(\frac{1-\kappa}{1+\kappa}\right)^{1/2\kappa}. \tag{I.8}$$

For a particle system, $x$ represents the particle velocity, $E(x)$ the energy, $\mu$ the chemical
potential, $T$ the temperature, and $k_B$ the Boltzmann constant.
The distribution function

$$f(x) = \frac{1}{Z} \exp_\kappa(-\beta x^\alpha) \tag{I.9}$$

has also been considered, to define both probability distribution functions (with $Z$ a
normalization constant) and cumulative distribution functions (with $Z = 1$). The distribution
functions given by Eqs. (I.7) and (I.9) have also been used to analyze non-physical systems.



The main result of the present effort is that the cumulative distribution function defined
by Eq. (I.9) can describe the whole spectrum of the size distribution of income, ranging
from the low region to the middle region, and up to the power-law tail, pointing in this way
toward a unified approach to the problem.

The paper is organized as follows. In Sec. II we consider the main properties of the
$\kappa$-generalized statistical distribution functions. In Sec. III, in order to assess the
reliability of the proposed $\kappa$-distribution, we compare the theoretical curve with the census
data for personal income in Germany, Italy, and the United Kingdom. Finally, in Sec. IV some
concluding remarks are reported.



II. THE $\kappa$-GENERALIZED STATISTICAL DISTRIBUTION



The $\kappa$-generalized Complementary Cumulative Distribution Function (CCDF) is given by

$$P_>(x) = \exp_\kappa(-\beta x^\alpha), \qquad x \in \mathbb{R}^+, \tag{II.1}$$

where $P_>(x)$ is the probability of finding the distribution variable with a value $X$ greater
than $x$. The income variable $x$ is defined as $x = z/\langle z \rangle$, where $z$ is the absolute
personal income and $\langle z \rangle$ its mean value. The dimensionless variable $x$ thus represents
the personal income in units of $\langle z \rangle$. The constant $\beta > 0$ is a characteristic scale,
since its value determines the scale of the probability distribution: if $\beta$ is large, the
distribution is more concentrated; if $\beta$ is small, it is more spread out (see FIG. 1(a)-(b)).
The exponent $\alpha > 0$ quantifies the curvature (shape) of the distribution, which is less (more)
pronounced for lower (higher) values of the parameter, as seen in FIG. 2(a)-(b). Finally, as one
can observe in FIG. 3(a)-(b), the deformation parameter $\kappa \in [0, 1)$ measures the fatness of
the upper tail: the larger (smaller) its magnitude, the fatter (thinner) the tail.



The function $P_>(x)$ defined through Eq. (II.1) can be viewed as a generalization of the
ordinary stretched exponential [26], i.e. $P_>^0(x) = \exp(-\beta x^\alpha)$, which is recovered in
the $\kappa \to 0$ limit. It is remarkable that for $x \to 0^+$ the function $P_>(x)$ behaves as the
ordinary stretched exponential

$$P_>(x) \underset{x \to 0^+}{\sim} \exp(-\beta x^\alpha), \tag{II.2}$$

while for $x \to +\infty$ it presents a power-law tail

$$P_>(x) \underset{x \to +\infty}{\sim} (2\beta\kappa)^{-1/\kappa}\, x^{-\alpha/\kappa}. \tag{II.3}$$
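The power-law tail follows in one step from the definition in Eq. (I.1). The short derivation below is a sketch that writes $t = \beta x^\alpha$ for brevity (the symbol $t$ is introduced here only for this purpose) and uses the identity $\left(\sqrt{1+\kappa^2 t^2} - \kappa t\right)\left(\sqrt{1+\kappa^2 t^2} + \kappa t\right) = 1$.

```latex
\begin{align*}
\exp_\kappa(-t)
  &= \left(\sqrt{1+\kappa^2 t^2} - \kappa t\right)^{1/\kappa}
   = \left(\sqrt{1+\kappa^2 t^2} + \kappa t\right)^{-1/\kappa} \\
  &\underset{t \to +\infty}{\sim} (2\kappa t)^{-1/\kappa}
   = (2\beta\kappa)^{-1/\kappa}\, x^{-\alpha/\kappa}.
\end{align*}
```

The tail exponent is therefore $\alpha/\kappa$, so for fixed $\alpha$ a larger $\kappa$ gives a slower decay, that is, a fatter tail.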
FIG. 1: (a) Plot of the $\kappa$-generalized CCDF given by Eq. (II.1) versus $x$ for some different
values of $\beta$ (= 0.20, 0.40, 0.60, 0.80), and fixed $\alpha$ (= 2.50) and $\kappa$ (= 0.75). (b) Plot
of the $\kappa$-generalized PDF given by Eq. (II.4) versus $x$ for some different values of $\beta$
(= 0.20, 0.40, 0.60, 0.80), and fixed $\alpha$ (= 2.50) and $\kappa$ (= 0.75). Notice that the
distribution spreads out (concentrates) as the value of $\beta$ decreases (increases).



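The Probability Density Function (PDF), $p(x) = -dP_>(x)/dx$, is given by

$$p(x) = \frac{\alpha\beta x^{\alpha-1} \exp_\kappa(-\beta x^\alpha)}{\sqrt{1 + \beta^2 \kappa^2 x^{2\alpha}}}, \tag{II.4}$$

and can be viewed as a generalization of the Weibull distribution [27], i.e.
$p^0(x) = \alpha\beta x^{\alpha-1} \exp(-\beta x^\alpha)$, which is recovered in the $\kappa \to 0$ limit.
For $x \to 0^+$ the function $p(x)$ given by Eq. (II.4) behaves as a Weibull distribution

$$p(x) \underset{x \to 0^+}{\sim} \alpha\beta x^{\alpha-1} \exp(-\beta x^\alpha), \tag{II.5}$$

while for $x \to +\infty$ it reduces to Pareto's law

$$p(x) \underset{x \to +\infty}{\sim} \frac{\alpha}{\kappa}\,(2\beta\kappa)^{-1/\kappa}\, x^{-\left(\frac{\alpha}{\kappa}+1\right)}. \tag{II.6}$$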
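As an illustration of Eqs. (II.1), (II.4) and (II.6), the following Python sketch evaluates the CCDF, the PDF and the Pareto limit of the PDF. It is not the code used for the analysis in this paper; the function names are hypothetical, and the parameter values in the example are simply those quoted in the caption of FIG. 1.

```python
import math


def exp_kappa(x: float, kappa: float) -> float:
    """kappa-exponential, Eq. (I.1), written in the equivalent asinh form."""
    if kappa == 0.0:
        return math.exp(x)
    return math.exp(math.asinh(kappa * x) / kappa)


def ccdf(x: float, alpha: float, beta: float, kappa: float) -> float:
    """Complementary CDF P_>(x) = exp_kappa(-beta x^alpha), Eq. (II.1)."""
    return exp_kappa(-beta * x**alpha, kappa)


def pdf(x: float, alpha: float, beta: float, kappa: float) -> float:
    """Density p(x) = -dP_>/dx, Eq. (II.4)."""
    t = beta * x**alpha
    return (alpha * beta * x ** (alpha - 1.0) * exp_kappa(-t, kappa)
            / math.sqrt(1.0 + (kappa * t) ** 2))


def pareto_tail_pdf(x: float, alpha: float, beta: float, kappa: float) -> float:
    """Large-x limit of the density, Eq. (II.6); requires kappa > 0."""
    return ((alpha / kappa) * (2.0 * beta * kappa) ** (-1.0 / kappa)
            * x ** (-(alpha / kappa + 1.0)))


if __name__ == "__main__":
    alpha, beta, kappa = 2.5, 0.2, 0.75  # values from the FIG. 1 caption
    for x in (0.5, 1.0, 5.0, 50.0):
        print(x, ccdf(x, alpha, beta, kappa), pdf(x, alpha, beta, kappa),
              pareto_tail_pdf(x, alpha, beta, kappa))
```

Printing the last two columns side by side shows how the exact density approaches the Pareto form as $x$ grows, while at small $x$ the two differ strongly.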



Starting from the law (II.4), one can calculate the mean value $\langle x \rangle$ which, taking into
account the meaning of the variable $x$, must be equal to unity:

$$\int_0^\infty x\, p(x)\, dx = 1. \tag{II.7}$$
FIG. 2: (a) Plot of the $\kappa$-generalized CCDF given by Eq. (II.1) versus $x$ for some different
values of $\alpha$ (= 1.00, 2.00, 2.50, 3.00), and fixed $\beta$ (= 0.20) and $\kappa$ (= 0.75). (b) Plot
of the $\kappa$-generalized PDF given by Eq. (II.4) versus $x$ for some different values of $\alpha$
(= 1.00, 2.00, 2.50, 3.00), and fixed $\beta$ (= 0.20) and $\kappa$ (= 0.75). Notice that the curvature
(shape) of the distribution becomes less (more) pronounced when the value of $\alpha$ decreases
(increases). The case $\alpha = 1.00$ corresponds to the ordinary exponential function.



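The latter relationship allows us to express the parameter $\beta$ as a function of the parameters
$\kappa$ and $\alpha$, obtaining

$$\beta = \frac{1}{2|\kappa|} \left[ \frac{\Gamma\!\left(\frac{1}{\alpha}\right)}{|\kappa| + \alpha}\; \frac{\Gamma\!\left(\frac{1}{2|\kappa|} - \frac{1}{2\alpha}\right)}{\Gamma\!\left(\frac{1}{2|\kappa|} + \frac{1}{2\alpha}\right)} \right]^{\alpha}, \tag{II.8}$$

where $\Gamma(x)$ is the Euler gamma function $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$. Thus the
problem of determining the values of the free parameters $(\kappa, \alpha, \beta)$ of the theory from the
empirical data reduces to a two-parameter $(\kappa, \alpha)$ fitting problem.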
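The constraint in Eq. (II.8) is easy to evaluate numerically. The sketch below is an illustration rather than the authors' implementation: the function name is hypothetical, log-gamma functions are used only to avoid overflow when $\kappa$ is small, and the formula is assumed to apply for $0 < |\kappa| < 1$ and $\alpha > |\kappa|$, the condition under which the mean in Eq. (II.7) exists.

```python
import math


def beta_from_mean_constraint(alpha: float, kappa: float) -> float:
    """Scale parameter beta implied by the unit-mean condition, Eq. (II.8)."""
    k = abs(kappa)
    if not (0.0 < k < 1.0) or alpha <= k:
        raise ValueError("need 0 < |kappa| < 1 and alpha > |kappa|")
    a = 1.0 / (2.0 * k)
    b = 1.0 / (2.0 * alpha)
    # Logarithm of the bracket in Eq. (II.8), computed with lgamma for stability.
    log_bracket = (math.lgamma(1.0 / alpha) - math.log(k + alpha)
                   + math.lgamma(a - b) - math.lgamma(a + b))
    return math.exp(alpha * log_bracket) / (2.0 * k)


if __name__ == "__main__":
    # (alpha, kappa) pairs quoted for Germany, Italy and the United Kingdom.
    for alpha, kappa in ((2.5659, 0.5697), (2.2540, 0.6944), (2.7357, 0.7080)):
        print(alpha, kappa, beta_from_mean_constraint(alpha, kappa))
```

Feeding in the fitted $(\alpha, \kappa)$ pairs should give values close to the corresponding $\beta$ estimates, which is a quick consistency check on a fit.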



III. AN APPLICATION TO PERSONAL INCOME DATA 

As a working example, we analyze the census data on personal income in three countries: 
Germany, Italy, and the United Kingdom. 1 



1 See Refs. for analyses referring to the same countries and data sources.
FIG. 3: (a) Plot of the $\kappa$-generalized CCDF given by Eq. (II.1) versus $x$ for some different
values of $\kappa$ (= 0.00, 0.30, 0.50, 0.80), and fixed $\beta$ (= 0.20) and $\alpha$ (= 2.50). (b) Plot
of the $\kappa$-generalized PDF given by Eq. (II.4) versus $x$ for some different values of $\kappa$
(= 0.00, 0.30, 0.50, 0.80), and fixed $\beta$ (= 0.20) and $\alpha$ (= 2.50). Notice that the upper tail
of the distribution fattens (thins) as the value of $\kappa$ increases (decreases). The case
$\kappa = 0.00$ corresponds to the ordinary stretched exponential (Weibull) function [26, 27].



The data used are drawn primarily from the Cross-National Equivalent File (CNEF) 
1980-2002, a commercially available dataset compiled by researchers at Cornell University 
which attempts to make comparable, among others, the following panel surveys: the German 
Socio-Economic Panel (GSOEP) and the British Household Panel Study (BHPS). 2 The 
income variable we use is the post-government income, representing the combined income 
after taxes and government transfers of the head, partner, and other family members. 

For Italy, which is not part of the CNEF, we use the Survey on Household Income and 
Wealth (SHIW), a household-based panel study carried out by the Bank of Italy since 1977. 
In place of the post-government income in the CNEF, we use the net disposable income 
variable from the survey above — i.e., the income recorded after the payment of taxes and 
social security contributions, defined as the sum of four main components: compensation 
of employees, pensions and net transfers, net income from self-employment, and property 

2 For background on the CNEF, see Ref. [3] or consult the CNEF homepage at the following web address: 
http : //www . human . Cornell . edu/ che/PAM/Research/Centers-Programs/German-Panel/cnef . cf m 
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TABLE I: Estimated parameters (with 95% confidence intervals half-widths) of the K-generalized 
distribution for the countries and years shown in FIGS. [HO and[6l Also shown is the estimated 
weighted average income 





Germany 


Italy 


United Kingdom 


K 


0.5697 ± 0.0005 


0.6944 ± 0.0006 


0.7080 ± 0.0006 


a 


2.5659 ± 0.0007 


2.2540 ± 0.0007 


2.7357 ± 0.0009 





0.8788 ± 0.0003 


1.0087 ± 0.0004 


0.9433 ± 0.0004 




36315.67 ± 339.24 


18087.92 ± 246.85 


14982.20 ± 183.09 



income. 3 



The results obtained by fitting our theoretical model to the observational data are
reported in TABLE I and FIGS. 4, 5, and 6. 4 Panel (a) of the figures shows the empirical
cumulative distribution estimate 5 of $x$ along with three different curves in the log-log scale:
the $\kappa$-generalized distribution, Eq. (II.1); the ordinary stretched exponential (Weibull)



3 For a comprehensive discussion of the dataset, see Ref. [30]; the
data are available for free download at the following web address:
http://www.bancaditalia.it/statistiche/ibf/statistiche/ibf/microdati/dati/emirchivio.htm.

4 To find the parameter values that give the best fit, we have used the Constrained Maximum
Likelihood (CML) estimation method [31], which solves the weighted maximum log-likelihood problem

$$\max_{\theta}\; l(x; \theta) = \sum_{i=1}^{n} w_i \ln p(x_i; \theta),$$

where $n$ is the number of observations, $w_i$ is the survey weight accommodating features of the
sample design and the population structure [32], and $p(x_i; \theta)$ is the probability of $x_i$
given $\theta$, the vector of parameters, subject to the non-linear equality constraint given by
Eq. (II.8) and to the bounds $\alpha, \beta > 0$ and $\kappa \in [0, 1)$. The CML procedure finds values
for the parameters in $\theta$ such that $l(x; \theta)$ is maximized using the Sequential Quadratic
Programming (SQP) method [33] as implemented in Matlab® 7. We have then calculated the
approximate 95% confidence interval half-width around each parameter by using the normal
approximation

$$\hat{\theta} \pm z_{1-a/2}\, \sigma_{\hat{\theta}},$$

where $\sigma_{\hat{\theta}}$ denotes the estimated standard error, obtained from a finite difference
approximation to the asymptotic covariance matrix of the maximum likelihood estimators of the
parameters, and $z_{1-a/2}$ is defined such that $\Phi(z_{1-a/2}) = 1 - a/2$ (here $a = 0.05$), with
$\Phi(\cdot)$ the standard normal distribution function. The overall analysis uses a simple
equivalence scale adjusting income by the square root of the number of household members to
account for differences in household size and composition.
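The objective of the weighted likelihood problem and the normal-approximation half-width can be sketched in a few lines of Python. This is an illustrative sketch only: it evaluates the weighted log-likelihood with $\beta$ eliminated through Eq. (II.8) and does not reproduce the constrained SQP optimization or the finite-difference covariance estimate. All function names are hypothetical, and the inputs are assumed to be positive rescaled incomes $x_i$ with $0 < \kappa < 1$ and $\alpha > \kappa$.

```python
import math
from statistics import NormalDist


def beta_from_constraint(alpha: float, kappa: float) -> float:
    """Scale parameter implied by the unit-mean constraint, Eq. (II.8)."""
    a, b = 1.0 / (2.0 * kappa), 1.0 / (2.0 * alpha)
    log_bracket = (math.lgamma(1.0 / alpha) - math.log(kappa + alpha)
                   + math.lgamma(a - b) - math.lgamma(a + b))
    return math.exp(alpha * log_bracket) / (2.0 * kappa)


def log_pdf(x: float, alpha: float, beta: float, kappa: float) -> float:
    """Logarithm of the kappa-generalized density, Eq. (II.4), for x > 0."""
    t = beta * x**alpha
    return (math.log(alpha * beta) + (alpha - 1.0) * math.log(x)
            - math.asinh(kappa * t) / kappa        # ln exp_kappa(-t)
            - 0.5 * math.log1p((kappa * t) ** 2))  # ln of the square-root factor


def weighted_loglik(incomes, weights, alpha: float, kappa: float) -> float:
    """Weighted log-likelihood with beta fixed by the constraint (II.8)."""
    beta = beta_from_constraint(alpha, kappa)
    return sum(w * log_pdf(x, alpha, beta, kappa)
               for x, w in zip(incomes, weights))


def ci_half_width(std_err: float, level: float = 0.95) -> float:
    """Half-width z * sigma of the normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return z * std_err
```

A general-purpose optimizer could then be applied to `weighted_loglik` over $(\alpha, \kappa)$; which optimizer to use is left open here, since only the SQP method named above is documented for the original analysis.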

5 The empirical cumulative distribution is equal to the normalized sum of the survey weights of the
individuals with incomes above $x$.
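The weighted estimate described in this footnote can be written compactly. The Python sketch below is a minimal illustration; the function name and the treatment of tied incomes (all observations equal to $x$ are excluded from the sum "above $x$") are assumptions made here, not details taken from the original analysis.

```python
def empirical_ccdf(incomes, weights):
    """Return (x, fraction of total weight with income strictly above x) pairs.

    One pair is produced per distinct income value, in increasing order.
    """
    pairs = sorted(zip(incomes, weights))
    total = sum(w for _, w in pairs)
    remaining = total
    result = []
    i = 0
    while i < len(pairs):
        x = pairs[i][0]
        # Remove the weight of every observation equal to x (handles ties).
        while i < len(pairs) and pairs[i][0] == x:
            remaining -= pairs[i][1]
            i += 1
        # Guard against tiny negative values from floating-point rounding.
        result.append((x, max(remaining, 0.0) / total))
    return result


if __name__ == "__main__":
    for x, p in empirical_ccdf([1.0, 2.0, 3.0], [1, 1, 2]):
        print(x, p)
```

Plotting the resulting pairs on log-log axes gives the kind of empirical curve against which the theoretical CCDF is compared.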
Germany - Income 2001: (a) Complementary CDF, (b) PDF histogram plot

FIG. 4: The German personal income distribution from the 2001 GSOEP-CNEF data file, measured
in current year euros. (a) Plot of the empirical CCDF versus income in the log-log scale. The solid
line is our theoretical model given by Eq. (II.1) with $\kappa = 0.5697 \pm 0.0005$,
$\alpha = 2.5659 \pm 0.0007$, and $\beta = 0.8788 \pm 0.0003$, which fits the data very well over the whole
range from the low to the high incomes, including the intermediate income region. This function is
compared with the ordinary stretched exponential one (dotted line), fitting the low income data,
and with the pure power-law (dashed line), fitting the high income data, by using the same
parameter values. (b) Histogram plot of the empirical PDF with superimposed fits of the
$\kappa$-generalized (solid line) and stretched exponential (dotted line) PDFs using the same
parameter values as in panel (a). The income axis limits have been adjusted according to the range
of data to shed light on the intermediate region between the bulk and the tail of the distribution.



distribution, Eq. (II.2); the pure power-law distribution, Eq. (II.3). In panel (b), the
histogram of the reconstructed probability density 6 is contrasted to the theoretical curves
corresponding to Eqs. (II.4) and (II.5) with the same parameter values as in TABLE I and
panel (a) of FIGS. 4, 5, and 6. It is clear that the $\kappa$-generalized distribution offers a great
potential for describing the data over their whole range, from the low to medium income

6 In order to estimate the empirical probability density, we divide the income axis into bins of
width $\Delta x$, calculate the sum of the survey weights of the individuals with incomes from $x$ to
$x + \Delta x$, and plot the obtained histogram.
Italy - Income 2002: (a) Complementary CDF, (b) PDF histogram plot

FIG. 5: Same plots as in FIG. 4 for the Italian personal income distribution from the 2002 SHIW
data file with $\kappa = 0.6944 \pm 0.0006$, $\alpha = 2.2540 \pm 0.0007$, and $\beta = 1.0087 \pm 0.0004$.
The income variable is measured in current year euros.

region through to the high income Pareto power-law regime, including the intermediate 
region for which a clear deviation exists when two different curves are used. 7 

IV. FINAL REMARKS 

Since the early study of Pareto, numerous empirical works have shown that
the power-law tail is a ubiquitous feature of income distributions. However, even 100 years
after Pareto's observation, the understanding of the shape of the income distribution is still far
from complete and definitive. This reflects the fact that there are two distributions, one for

7 Pareto's contribution has also stimulated further research on the specification of new models
to fit the whole range of income; the interested reader is referred to the review in Ref. [34] and
the bibliography therein for an exhaustive list of personal income distributions and their basic
properties. Weibull, gamma, beta, Dagum, Singh-Maddala, Fisk, Lomax, Pareto-Lévy, and
Champernowne distributions, to name just a few, have all been used as descriptive models for the
overall distribution of income; many of them are special or limiting cases of more general
parametric families, such as the generalized gamma distribution and the (generalized) beta
distribution of the second kind. Although we are well aware that this large body of income
distributions exists and that our work could ultimately result in duplication of effort, our main
goal in this field is to concentrate on the opportunity of transposing the tools, methods and
concepts from statistical mechanics to economics.
United Kingdom - Income 2001: (a) Complementary CDF, (b) PDF histogram plot

FIG. 6: Same plots as in FIGS. 4 and 5 for the UK personal income distribution from the 2001
BHPS-CNEF data file with $\kappa = 0.7080 \pm 0.0006$, $\alpha = 2.7357 \pm 0.0009$, and
$\beta = 0.9433 \pm 0.0004$. The income variable is measured in current year British pounds.

the rich, following Pareto's law, and one for the vast majority of people, which appears
to be governed by a completely different law.

In the present work we have presented support for a new fitting function, rooted
in the framework of $\kappa$-generalized statistical mechanics, which proves able to describe
the data over the entire range, including the power-law tail. This distribution has a
bulk very close to the stretched exponential one, which is recovered when the deformation
parameter $\kappa$ approaches zero, while its tail decays following a power law for high values
of income, thus providing a kind of compromise between the two descriptions.

The good concordance of our generalized statistical distribution with observational data
on personal income may suggest a new path for investigating economic relations, namely
the development of models within the framework of $\kappa$-generalized statistical mechanics.